Abstract

Some scientific publications are under suspicion of fabrication of
data. Since humans are bad random number generators, there might
be some evidential value in favor of fabrication in the statistical results
as presented in such papers. In line with Uri Simonsohn (2012, 2013)
we study the evidential value of the results of an ANOVA study in
favor of the hypothesis of a dependence structure in the underlying
data.

1 Evidential Value in Forensic Statistics

At some crime scene a trace has been found that links a suspect to the crime. 
In the court case the prosecutor puts forward the hypothesis $H_p$ that the
suspect is the donor of the trace. The defendant claims that the hypothesis $H_d$
holds, which states that an unknown person, not the suspect, is the donor
of the trace. The juror (judge, jury) has to decide in favor of $H_p$ or $H_d$. An

important current scientific approach to such criminal court cases is via the 
so-called Bayesian Paradigm of Forensic Statistics. 

Within this paradigm the juror has to construct a prior opinion about $H_p$
and $H_d$. This means that the juror has to decide beforehand, before seeing
the evidence, how likely the hypothesis of the prosecutor is in comparison
to the hypothesis of the defendant. This prior opinion might be based on
e.g. the number of possible offenders, and it may be formulated in terms of
the prior odds in favor of the hypothesis of the prosecutor, namely

$$P(H_p)/P(H_d).$$



The evidence in such a court case consists of the trace found at the crime 
scene and characteristics of the suspect. Let us denote it by E. The forensic 
expert now has to determine the probability that a randomly chosen person
would leave a trace like the one found at the crime scene. This probability
is denoted by $P(E \mid H_d)$. Likewise he has to determine $P(E \mid H_p)$, the
probability that the suspect would leave a trace like the one found at the crime
scene. The ratio

$$P(E \mid H_p)/P(E \mid H_d)$$

is called the likelihood ratio. Multiplying the prior odds and the likelihood
ratio, the juror obtains the so-called posterior odds in favor of the hypothesis
of the prosecutor

$$P(H_p \mid E)/P(H_d \mid E),$$

i.e., the odds in favor of $H_p$ after having seen the evidence. The juror has
to base his decision on these posterior odds. In summary, the Bayesian
Paradigm of Forensic Statistics reads as follows



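$$
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\times
\overbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}^{\text{likelihood ratio}}
=
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
\tag{1}
$$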



The validity of equation (1) may be checked straightforwardly by applying
the definition of conditional probability, which is

$$P(A \mid B) = P(A \cap B)/P(B),$$

where $A \cap B$ is the intersection of $A$ and $B$. Since the likelihood ratio in
(1) may be interpreted as the weight that the evidence should have in the
decision of the juror, it is often called the evidential value in forensic science.
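
The check of equation (1) mentioned above can be written out in two lines. The following is a sketch of that computation, assuming that $P(E)$, $P(H_p)$ and $P(H_d)$ are all positive so that every conditional probability is defined.

```latex
\begin{align*}
\frac{P(H_p \mid E)}{P(H_d \mid E)}
  &= \frac{P(H_p \cap E)/P(E)}{P(H_d \cap E)/P(E)}
   = \frac{P(E \cap H_p)}{P(E \cap H_d)} \\
  &= \frac{P(E \mid H_p)\,P(H_p)}{P(E \mid H_d)\,P(H_d)}
   = \frac{P(H_p)}{P(H_d)} \times \frac{P(E \mid H_p)}{P(E \mid H_d)} .
\end{align*}
```

The common factor $P(E)$ cancels, which is why the juror never needs the unconditional probability of the evidence.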
The evidence $E$ is viewed here as a realization of a random mechanism,
both under $H_d$ and $H_p$. In case this random mechanism produces outcomes
via probability density functions $f(E \mid H_p)$ and $f(E \mid H_d)$, the probabilities
in the likelihood ratio or evidential value are replaced by the corresponding
probability density functions, resulting in

$$
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\times
\overbrace{\frac{f(E \mid H_p)}{f(E \mid H_d)}}^{\text{likelihood ratio}}
=
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
\tag{2}
$$



2 Modelling Fabrication of Data Underlying an 
ANOVA Study 

In Analysis of Variance the basic assumption is that all observations may be
viewed as realizations of independent normally distributed random variables
with the same variance $\sigma^2$ and with means that depend on the values of some
categorical covariates. Let $I$ be the total number of cells that are defined
via these categorical covariates, and let the number of observations per cell
be the same, namely $n$. The random variables denoting the observations are
then

$$X_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, \dots, I, \ j = 1, \dots, n. \tag{3}$$

The cell means $\mu_i$ are unknown real numbers, and the measurement errors
$\varepsilon_{ij}$ are independent, normally distributed random variables with mean 0 and
variance $\sigma^2$.

If authors fabricate or falsify data, they tend to underestimate the
variation that the data should show due to the randomness within the model.
Within the framework of the above ANOVA case, we model this by introducing
dependence between the normal random variables $\varepsilon_{ij}$, which represent the
measurement errors. Actually, we assume that the measurement errors in any
cell have correlation coefficient $\rho$ with respect to the corresponding
measurement errors in the other cells. More precisely formulated, we assume
that the correlations between the random variables $\varepsilon_{ij}$ no longer all
vanish, but satisfy

$$\rho(\varepsilon_{ij}, \varepsilon_{hj}) = \rho, \quad j = 1, \dots, n, \ 1 \le i \ne h \le I, \tag{4}$$

with all other correlations still being equal to 0. In the sequel we restrict
attention to nonnegative values of $\rho$ and we exclude $\rho = 1$ for technical
reasons, so $0 \le \rho < 1$. We note that under the standard assumptions of
ANOVA $\rho = 0$ holds. Furthermore, we note that within cells observations
may be renumbered in order to get the structure (4). Nevertheless, we still
assume (3) to hold and the measurement errors to be normally distributed
with mean 0 and variance $\sigma^2$.

A way in which fabrication of measurement errors may take place is by
copying some of them. This might be modelled as follows. Let $U_j$, $j = 1, \dots, n$,
and $V_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, n$, be independent and identically
distributed normal random variables with mean 0 and variance $\sigma^2$.
Independent of these, let the random indicators $\Delta_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, n$,
be independent and identically distributed Bernoulli random variables with
$P(\Delta_{ij} = 1) = \sqrt{\rho}$ and $P(\Delta_{ij} = 0) = 1 - \sqrt{\rho}$. Then

$$\varepsilon_{ij} = \Delta_{ij} U_j + (1 - \Delta_{ij}) V_{ij}, \quad i = 1, \dots, I, \ j = 1, \dots, n, \tag{5}$$

satisfy (4) and (3). Note that for $1 \le i \ne h \le I$ we have $\varepsilon_{ij} = \varepsilon_{hj} = U_j$ with
probability $(\sqrt{\rho})^2 = \rho$ then, and the measurement errors satisfy (4).



Finally, we note that (4) is just one possible way to model dependence,
and that the actual way in which fabrication has been implemented might
lead to quite different dependence structures. However, this model will come
close to some types of fabrication and falsification.
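
The copying mechanism (5) is easy to simulate. The sketch below is only an illustration: the function names, the seed and the values $\rho = 0.36$, $\sigma = 1$ are our own choices and not part of the model. It draws the errors of two cells and checks that their empirical correlation is close to $\rho$, as (4) requires.

```python
import math
import random

def simulate_copying_errors(rho, sigma, n_obs, n_cells, rng):
    """Draw errors eps[i][j] following the copying model (5)."""
    p_copy = math.sqrt(rho)  # P(Delta_ij = 1)
    eps = [[0.0] * n_obs for _ in range(n_cells)]
    for j in range(n_obs):
        u_j = rng.gauss(0.0, sigma)  # shared value U_j that may be copied into several cells
        for i in range(n_cells):
            if rng.random() < p_copy:
                eps[i][j] = u_j                    # Delta_ij = 1: copy U_j
            else:
                eps[i][j] = rng.gauss(0.0, sigma)  # Delta_ij = 0: fresh error V_ij
    return eps

def sample_correlation(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(12345)  # fixed seed, chosen arbitrarily for reproducibility
rho = 0.36
eps = simulate_copying_errors(rho, sigma=1.0, n_obs=50_000, n_cells=2, rng=rng)
estimate = sample_correlation(eps[0], eps[1])
print(f"target rho = {rho}, empirical correlation = {estimate:.3f}")
```

Each error is marginally normal with mean 0 and variance $\sigma^2$, whichever branch is taken, so only the dependence between cells is affected.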



3 Evidential Value for Fabrication of Data Underlying
an ANOVA Study

Consider a study in a scientific research paper. The data in this study are 
analyzed by ANOVA and presented via the sample averages of the cells and 
the values of some F-statistics. The underlying data themselves are not 
published and are not available. The conclusion of this study is that the
$I$ cells can be grouped into $K$ groups of $I_k$ cells ($\sum_{k=1}^K I_k = I$), such that
(possibly after renumbering of the cells) group $k$ consists of cells
$i = L_{k-1} + 1, \dots, L_k$, $k = 1, \dots, K$, with $0 = L_0 < L_1 < \dots < L_K = I$,
$L_k - L_{k-1} = I_k$, and such that for each group the population cell means are
the same, i.e.,

$$\mu_i = \nu_k, \quad i = L_{k-1} + 1, \dots, L_k, \ k = 1, \dots, K, \tag{6}$$

for some values $\nu_k$, $k = 1, \dots, K$.

There are two hypotheses to be formulated about the data underlying
the ANOVA study. The hypothesis $H_p$ of fabrication of the data underlying
the results presented in the paper is $0 < \rho < 1$. The other hypothesis $H_d$
represents the situation that data have been collected according to (3) with
independent $X_{ij}$, i.e., $\rho = 0$. We want to determine the evidential value of
the ANOVA study, i.e., of the sample means of the cells and the published
F-statistics, with respect to these hypotheses $H_p$ and $H_d$.

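To this end we first note that the sample averages in the cells,

$$\bar X_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} X_{ij}, \quad i = 1, \dots, I, \tag{7}$$

have a joint $I$-dimensional multivariate normal distribution. Actually, the
dependence structure (4) implies

$$
\begin{pmatrix} \bar X_{1\cdot} \\ \vdots \\ \bar X_{I\cdot} \end{pmatrix}
\sim N\left(
\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_I \end{pmatrix},\
\frac{\sigma^2}{n}
\begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix}
\right).
\tag{8}
$$

Instead of assuming (4), we could have started right away from (8).
Since the inverse of the covariance matrix in (8) equals

$$
\frac{n}{\sigma^2 (1 - \rho)(1 + (I - 1)\rho)}
\begin{pmatrix}
1 + (I - 2)\rho & -\rho & \cdots & -\rho \\
-\rho & 1 + (I - 2)\rho & \cdots & -\rho \\
\vdots & & \ddots & \vdots \\
-\rho & -\rho & \cdots & 1 + (I - 2)\rho
\end{pmatrix}
\tag{9}
$$

and the determinant of $n\sigma^{-2}$ times this covariance matrix equals

$$
\begin{vmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{vmatrix}
=
\begin{vmatrix}
1 + (I - 1)\rho & \rho & \cdots & \rho \\
0 & 1 - \rho & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 - \rho
\end{vmatrix}
= (1 + (I - 1)\rho)(1 - \rho)^{I-1},
\tag{10}
$$

the joint density of $\bar X_{1\cdot}, \dots, \bar X_{I\cdot}$ under (6) equals

$$
\left(\frac{n}{2\pi\sigma^2}\right)^{I/2}
\left[(1 + (I - 1)\rho)(1 - \rho)^{I-1}\right]^{-1/2}
\exp\left(-\frac{n}{2\sigma^2}\left[
\frac{1}{1 - \rho} \sum_{k=1}^{K} \sum_{i=L_{k-1}+1}^{L_k} (x_i - \nu_k)^2
- \frac{\rho}{(1 + (I - 1)\rho)(1 - \rho)}
\left(\sum_{k=1}^{K} \sum_{i=L_{k-1}+1}^{L_k} (x_i - \nu_k)\right)^2
\right]\right).
\tag{11}
$$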
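
The closed forms (9) and (10) can be checked numerically. The sketch below does so for one arbitrarily chosen size and correlation (5 cells, $\rho = 0.3$), leaving out the common factor $\sigma^2/n$; all function names are our own, and the determinant is computed by ordinary Gaussian elimination rather than by any method used in the text.

```python
def equicorrelation(size, rho):
    """Matrix with ones on the diagonal and rho elsewhere, as in (8)."""
    return [[1.0 if r == c else rho for c in range(size)] for r in range(size)]

def closed_form_inverse(size, rho):
    """Inverse of the equicorrelation matrix as in (9), without the factor n / sigma^2."""
    scale = 1.0 / ((1.0 - rho) * (1.0 + (size - 1) * rho))
    return [[scale * (1.0 + (size - 2) * rho) if r == c else -scale * rho
             for c in range(size)] for r in range(size)]

def matmul(a, b):
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

size, rho = 5, 0.3  # arbitrary test values
product = matmul(equicorrelation(size, rho), closed_form_inverse(size, rho))
max_identity_error = max(abs(product[r][c] - (1.0 if r == c else 0.0))
                         for r in range(size) for c in range(size))
det_numeric = determinant(equicorrelation(size, rho))
det_closed = (1.0 + (size - 1) * rho) * (1.0 - rho) ** (size - 1)  # right-hand side of (10)
print(max_identity_error, det_numeric, det_closed)
```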



This density depends on the parameters $\rho, \sigma^2, \nu_1, \dots, \nu_K$. If the underlying
data were available, their mean square error

$$\frac{1}{I(n - 1)} \sum_{i=1}^{I} \sum_{j=1}^{n} (X_{ij} - \bar X_{i\cdot})^2 \tag{12}$$

would be the proper unbiased estimator of $\sigma^2$. The distribution of this
estimator depends on $\rho$, but its mean does not. Furthermore, standard ANOVA
theory shows that this estimator is independent of the exponent in (11).
Since the underlying data are not available, the value of the parameter $\sigma^2$
should be retrieved from the values of the F-statistics given. For a method
to do this that does not depend on $\rho$, see the next section. Let us call the
resulting estimate $\hat\sigma_n^2$, and let us denote the density from (11) with $\sigma$ replaced
by $\hat\sigma_n$ by $f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, \rho)$.

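The hypothesis $H_p$ of fabrication of the data corresponds to the parameter
values $0 < \rho < 1$ and $\nu_1, \dots, \nu_K$ arbitrary, and the hypothesis $H_d$
of proper data corresponds to the parameter values $\rho = 0$ and $\nu_1, \dots, \nu_K$
arbitrary. The evidential value

$$\frac{f(E \mid H_p)}{f(E \mid H_d)}$$

from (2) in favor of $H_p$ versus $H_d$ becomes in this case (cf. Zhang (2009),
Bickel (2012))

$$
V = \frac{\sup_{0 \le \rho < 1,\ \nu_1, \dots, \nu_K \in \mathbb{R}} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, \rho)}
{\sup_{\nu_1, \dots, \nu_K \in \mathbb{R}} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, 0)}.
\tag{13}
$$

Straightforward computation shows that for any $\rho$

$$\sup_{\nu_1, \dots, \nu_K \in \mathbb{R}} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, \rho)$$

is attained at the group averages

$$\nu_k = \bar X_k = \frac{1}{L_k - L_{k-1}} \sum_{i=L_{k-1}+1}^{L_k} \bar X_{i\cdot}, \quad k = 1, \dots, K. \tag{14}$$

This implies that the evidential value from (13) reduces to

$$V = \sup_{0 \le \rho < 1} \chi_n(\rho) \tag{15}$$

with

$$
\chi_n(\rho) = \left[(1 + (I - 1)\rho)(1 - \rho)^{I-1}\right]^{-1/2}
\exp\left(-\frac{n\rho}{2\hat\sigma_n^2 (1 - \rho)} \sum_{k=1}^{K} \sum_{i=L_{k-1}+1}^{L_k} (\bar X_{i\cdot} - \bar X_k)^2\right).
\tag{16}
$$

We need the additional notation

$$S_n = \frac{n}{I \hat\sigma_n^2} \sum_{k=1}^{K} \sum_{i=L_{k-1}+1}^{L_k} (\bar X_{i\cdot} - \bar X_k)^2, \tag{17}$$

$$\rho_n = \frac{1}{2}(1 - S_n)\left[1 + \sqrt{1 - \frac{4 S_n}{(I - 1)(1 - S_n)^2}}\right]. \tag{18}$$

In Proposition A.1 of the Appendix the following is shown.

• If

$$S_n \ge \frac{\sqrt{I} - 1}{\sqrt{I} + 1}$$

holds, then the evidential value from (13) and (15) reduces to $V = 1$.

• If

$$S_n < \frac{\sqrt{I} - 1}{\sqrt{I} + 1}$$

holds, then $\rho_n$ is well-defined and the evidential value from (13) and
(15) reduces to

$$V = \max\{\chi_n(\rho_n), 1\}. \tag{19}$$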
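
The two cases above translate directly into a few lines of code. The sketch below is an illustration with hypothetical function names; it uses the fact that, by (17), the exponent in (16) equals $-\tfrac{1}{2} I S_n \rho/(1 - \rho)$, so that $V$ depends on the data only through $S_n$ and $I$. As a sanity check it compares the closed form with a brute-force maximization of $\chi_n$ over a fine grid, for three arbitrary values of $S_n$ and $I = 12$.

```python
import math

def chi_n(rho, s_n, n_cells):
    """chi_n(rho) from (16), written in terms of S_n from (17)."""
    log_det = math.log(1.0 + (n_cells - 1) * rho) + (n_cells - 1) * math.log(1.0 - rho)
    return math.exp(-0.5 * log_det - 0.5 * n_cells * s_n * rho / (1.0 - rho))

def evidential_value(s_n, n_cells):
    """V according to the two cases around (19); assumes s_n > 0."""
    root_i = math.sqrt(n_cells)
    threshold = (root_i - 1.0) / (root_i + 1.0)
    if s_n >= threshold:
        return 1.0
    disc = 1.0 - 4.0 * s_n / ((n_cells - 1) * (1.0 - s_n) ** 2)
    rho_n = 0.5 * (1.0 - s_n) * (1.0 + math.sqrt(max(disc, 0.0)))  # (18)
    return max(chi_n(rho_n, s_n, n_cells), 1.0)

def brute_force_sup(s_n, n_cells, steps=100_000):
    """Maximum of chi_n over the grid 0, 1/steps, ..., (steps - 1)/steps."""
    return max(chi_n(k / steps, s_n, n_cells) for k in range(steps))

results = {s_n: (evidential_value(s_n, 12), brute_force_sup(s_n, 12))
           for s_n in (0.05, 0.3, 0.7)}
for s_n, (closed, brute) in results.items():
    print(f"S_n = {s_n}: closed form {closed:.6g}, grid search {brute:.6g}")
```

For $S_n = 0$ (identical cell means within every group) the formula gives $\rho_n = 1$ and the supremum is infinite, which is why the sketch assumes $S_n > 0$.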
4 Estimating $\sigma^2$ from F-Statistics

Table 1 in Stapel, Koomen and Van der Pligt (1996) presents the sample 
means in a three-way layout ANOVA study with a 3 x 2 x 2 design. 



                             Prime type
                             Positive    Negative    Irrelevant
Impersonal / Memory            2.3         3.5         2.9
Impersonal / Impression        3.4         2.5         2.9
Personal / Memory              3.3         2.3         2.9
Personal / Impression          3.5         2.5         3.0



The estimate of the error variance $\sigma^2$ is not given. It should be possible to
retrieve this estimate from the value of any F-statistic. On page 441 of ibid.
the value of the F-statistic for testing three-way interactions is given, namely,
$F(2, 326) = 3.21$. We assume that the 338 observations are approximately
uniformly distributed over the 12 cells. This yields an average of 28.17
observations per cell. Applying e.g. Table 4.5.2 (Analysis of Variance of
the Three-Way Layout with M Observations per Cell) of Scheffé (1959)
we obtain by some computation that the mean square for interaction
equals 7.769. Dividing this by the value 3.21 of the F-statistic we get 2.420
as the mean square for error, i.e., the estimate for $\sigma^2$. However, this is not
the value that we would have obtained from the underlying
observations, since the cell means, which are used in the above computation,
are given in very low precision.

In an ANOVA of the upper half of Table 1 in ibid. the two-way interaction
terms are tested by an F-statistic with value $F(2, 164) = 14.28$. By
Table 4.3.1 of Scheffé (1959) a similar computation as above yields 1.095 as
the value of the mean square for error, based on 170 observations.
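
The two computations above can be retraced as follows. This is a sketch under the stated assumption of equal cell sizes (338/12 observations per cell for the full table and, as our own additional assumption, 170/6 for the upper half); the function name and the sign vectors encoding the interaction contrasts are ours. For a balanced layout in which all factors except Prime type have two levels, the interaction sum of squares equals the number of observations per cell times the sum of squared deviations of the per-column contrasts from their mean, divided by the number of rows.

```python
def interaction_mean_square(rows, signs, obs_per_cell):
    """Mean square of the highest-order interaction with the column factor.

    rows:  one row of cell means per combination of the two-level factors;
           the columns are the levels of the remaining factor (Prime type).
    signs: +1/-1 per row, the interaction contrast of the two-level factors.
    """
    n_levels = len(rows[0])
    contrasts = [sum(s * row[a] for s, row in zip(signs, rows)) for a in range(n_levels)]
    mean_contrast = sum(contrasts) / n_levels
    sum_of_squares = (obs_per_cell
                      * sum((c - mean_contrast) ** 2 for c in contrasts) / len(rows))
    return sum_of_squares / (n_levels - 1)  # degrees of freedom: n_levels - 1

table_1 = [
    [2.3, 3.5, 2.9],  # Impersonal / Memory
    [3.4, 2.5, 2.9],  # Impersonal / Impression
    [3.3, 2.3, 2.9],  # Personal / Memory
    [3.5, 2.5, 3.0],  # Personal / Impression
]

# Three-way interaction, full table, F(2, 326) = 3.21.
ms_three_way = interaction_mean_square(table_1, [+1, -1, -1, +1], obs_per_cell=338 / 12)
mse_three_way = ms_three_way / 3.21

# Two-way interaction, upper half of the table, F(2, 164) = 14.28.
ms_two_way = interaction_mean_square(table_1[:2], [+1, -1], obs_per_cell=170 / 6)
mse_two_way = ms_two_way / 14.28

print(round(ms_three_way, 3), round(mse_three_way, 3), round(mse_two_way, 3))
```

Dividing a mean square for an effect by its reported F-value is the only step that uses the published statistics; everything else follows from the rounded cell means.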

Note that a value like 2.3 for a cell mean implies that the actual value of
the cell mean lies in the interval [2.25, 2.35). Using this rounding property
we may conclude that the first three F-values given on page 442 of ibid.,
which have 1 and 164 degrees of freedom, imply that the value of the mean
square for error, based on 170 observations, lies in the interval [0.918, 1.218].
Note that 1.095 belongs to this interval. Averaging the values of the mean
square for error that we get from the last four F-values, we obtain 1.047 as
our estimate.

The F-values presented on page 442 of ibid. that are based on the second
half of Table 1 of ibid., namely $F(2, 162) = 11.49$ and $F(1, 162) = 23.00$,
yield 1.223 and 1.217 as value of the mean square for error, based on the
remaining 168 observations. Averaging yields 1.220. Pooling 1.047 and 1.220
we obtain

$$\hat\sigma_n^2 = 1.134$$

as our final estimate for $\sigma^2$. Note that this deviates considerably from the
value 2.42, which has been obtained from the F-value 3.21 for three-way
interaction. Let us presume here that this is a misprint and that this F-value
should have been something like 6.9.

In order to take care of the rounding of the values of the cell means,
we have adapted Table 1 in a direction that increases the double sum in
(16) as much as possible and that should decrease the evidential value. The
resulting table is given below.



                             Prime type
                             Positive    Negative    Irrelevant
Impersonal / Memory            2.25        3.55        2.85
Impersonal / Impression        3.35        2.55        2.85
Personal / Memory              3.25        2.25        2.85
Personal / Impression          3.55        2.55        3.05



Analyzing the same F-statistics as above and performing the same
computations we see that the F-statistics for interactions yield exactly the same
values for the mean square for error. Only the three F-statistics of the type
$F(1, 164)$ yield different values. Averaging the four values for the mean
square for error that we get out of the four F-values related to the upper
half of the table, we arrive at 1.117. The F-statistics for the second half of
the table yield the same estimate 1.220. Pooling 1.117 and 1.220 we obtain

$$\hat\sigma_n^2 = 1.168$$

as our final estimate for $\sigma^2$ based on our version of Table 1 of ibid.



5 Computing Evidential Value 

Let us group the cells of the tables in the preceding section into three groups,
namely the groups corresponding to the covariate Prime type with the first
two cells in the row Impersonal / Memory interchanged; $I = 12$, $K = 3$,
$I_1 = I_2 = I_3 = 4$. According to the social psychology theory as put forward in
Stapel, Koomen and Van der Pligt (1996), the participants within these
groups should have similar mean scores. By (15) through (19) we may
compute the evidential value $V$ in favor of the hypothesis $H_p$ that these
data have been fabricated in some way resulting in (8) with $0 < \rho < 1$. For
the first table from the preceding section, i.e., Table 1 from ibid., this yields

$$V = 56.88$$

and for the second, adapted table from the preceding section this yields

$$V = 1.92.$$
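
These two values can be retraced from the cell means. The sketch below follows (15) through (19) with the grouping described above; the function name is hypothetical, and the number of observations per cell is taken to be exactly 338/12, in line with the uniformity assumption of the preceding section.

```python
import math

def evidential_value(groups, obs_per_cell, sigma2):
    """V from (15)-(19) for cell means arranged in groups with equal population means."""
    n_cells = sum(len(g) for g in groups)
    double_sum = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        double_sum += sum((x - group_mean) ** 2 for x in g)
    s_n = obs_per_cell * double_sum / (n_cells * sigma2)             # (17)
    root_i = math.sqrt(n_cells)
    if s_n >= (root_i - 1.0) / (root_i + 1.0):
        return 1.0
    disc = 1.0 - 4.0 * s_n / ((n_cells - 1) * (1.0 - s_n) ** 2)
    rho_n = 0.5 * (1.0 - s_n) * (1.0 + math.sqrt(max(disc, 0.0)))    # (18)
    log_chi = (-0.5 * math.log(1.0 + (n_cells - 1) * rho_n)
               - 0.5 * (n_cells - 1) * math.log(1.0 - rho_n)
               - obs_per_cell * rho_n * double_sum / (2.0 * sigma2 * (1.0 - rho_n)))  # (16)
    return max(math.exp(log_chi), 1.0)                               # (19)

OBS_PER_CELL = 338 / 12  # assumption: observations spread evenly over the 12 cells

# Columns of the tables, with the first two cells of row Impersonal / Memory interchanged.
table_1 = [[3.5, 3.4, 3.3, 3.5], [2.3, 2.5, 2.3, 2.5], [2.9, 2.9, 2.9, 3.0]]
adapted = [[3.55, 3.35, 3.25, 3.55], [2.25, 2.55, 2.25, 2.55], [2.85, 2.85, 2.85, 3.05]]

v_table_1 = evidential_value(table_1, OBS_PER_CELL, sigma2=1.134)
v_adapted = evidential_value(adapted, OBS_PER_CELL, sigma2=1.168)
print(round(v_table_1, 2), round(v_adapted, 2))
```

The large difference between the two results shows how sensitive $V$ is to the rounding of the published cell means.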



6 Interpreting Evidential Value 



With the evidential value $V$ defined as in (15) through (19) the Bayesian
paradigm for criminal court cases (2) becomes

$$
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\times
\overbrace{V}^{\text{evidential value}}
=
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}.
\tag{20}
$$

An important principle in criminal court cases is 'in dubio pro reo', which
means that in case of doubt the accused is favored. In science one might
argue that the leading principle should be 'in dubio pro scientia', which should
mean that in case of doubt a publication should be withdrawn. Within the
framework of this paper this would imply that if the posterior odds in favor
of hypothesis $H_p$ of fabrication equal at least 1, then the conclusion should
be that $H_p$ is true. So an ANOVA study for which

$$
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\times
\overbrace{V}^{\text{evidential value}}
=
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
\ge 1
\tag{21}
$$

holds should be rejected and disqualified scientifically.
We conclude with some notes. 

• ANOVA studies are based on the assumption of normality. Often this
assumption is not satisfied, but the technique is still applied. This is
the case in Stapel et al. (1996), since in Table 1 of ibid. the
measurements are averages of two 7-point Likert scales, which hardly behave
like normal random variables. However, in view of the central limit
theorem cell means as in our basic model (8) behave approximately
like jointly normal random variables.

• Note that (19) implies

$$V \ge 1.$$

Consequently, within this framework there does not exist exculpatory
evidence. This is reasonable since bad science cannot be compensated
by very good science. It should be very good anyway.

• When a paper contains more than one study based on independent
data, the evidential values of these studies may be combined into an
overall evidential value by multiplication in order to
determine the validity of the whole paper; see the preceding item.

• One may wonder if the way in which the mean square error (12) is
retrieved from the values of F-statistics interferes with the randomness
in (13). As mentioned in Section 3, standard ANOVA theory shows that
this estimator is independent of the exponent in (11) and hence of (13),
provided the underlying data have a normal distribution; see also item 1.
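
The decision rule of this section, together with the multiplication of evidential values over independent studies, may be sketched as follows. All numbers below are hypothetical and serve only to show the mechanics; in particular the prior odds are not taken from any source.

```python
import math

def combined_evidential_value(values):
    """Overall evidential value of independent studies: the product of their values."""
    return math.prod(values)

def posterior_odds(prior_odds, evidential_value):
    """Prior odds times evidential value."""
    return prior_odds * evidential_value

def should_be_rejected(prior_odds, evidential_value):
    """'In dubio pro scientia': reject when the posterior odds are at least 1."""
    return posterior_odds(prior_odds, evidential_value) >= 1.0

study_values = [4.0, 2.5, 1.0]  # hypothetical evidential values of three independent studies
overall = combined_evidential_value(study_values)

print(overall)
print(should_be_rejected(0.05, overall))  # sceptical prior odds of 1 to 20
print(should_be_rejected(0.2, overall))   # prior odds of 1 to 5
```

Since every evidential value is at least 1, adding a further study can never lower the overall value.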



7 Evidential Value for Fabrication of Data Underlying
an ANOVA Study Based on an Alternative
Dependence Structure

In this Section we present an analysis as in Sections 2 and 3, but under a
different dependence structure. Given the group structure of the cells as
presented in the first paragraph of Section 3 we assume the existence of
$\rho_1, \dots, \rho_K \in [0, 1)$ such that

$$\rho(\varepsilon_{ij}, \varepsilon_{hj}) = \rho_k, \quad j = 1, \dots, n, \ L_{k-1} + 1 \le i \ne h \le L_k, \ k = 1, \dots, K, \tag{22}$$

hold with all other correlations being equal to 0. This implies independence
between different groups of cells. We note that (22) is just another possible
way to model dependence, and we note again that the actual way in which
fabrication has been implemented might lead to quite different dependence
structures.

We reconsider the ANOVA study presented via the sample averages of
the cells and the values of some F-statistics. Again the underlying data
themselves are not published and are not available, and the conclusion of
this study is given by (6). There are two hypotheses to be formulated about
the data underlying the ANOVA study. The hypothesis $H_p$ of fabrication of
the data underlying the results is that at least one of the $\rho_k$'s is positive. The
other hypothesis $H_d$ represents the situation that data have been collected
according to (3) with independent $X_{ij}$, i.e., $\rho_1 = \dots = \rho_K = 0$. We want to
determine the evidential value of the ANOVA study, i.e., of the sample means
of the cells and the published F-statistics, with respect to these hypotheses
$H_p$ and $H_d$. Here the evidential value is defined analogously to (13) with the
supremum taken over $0 \le \rho_k < 1$, $k = 1, \dots, K$.

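The sample averages in the cells, $\bar X_{i\cdot}$ from (7), have a joint $I$-dimensional
multivariate normal distribution with

$$
\begin{pmatrix} \bar X_{L_{k-1}+1\,\cdot} \\ \vdots \\ \bar X_{L_k\,\cdot} \end{pmatrix}
\sim N\left(
\begin{pmatrix} \nu_k \\ \vdots \\ \nu_k \end{pmatrix},\
\frac{\sigma^2}{n}
\begin{pmatrix}
1 & \rho_k & \cdots & \rho_k \\
\rho_k & 1 & \cdots & \rho_k \\
\vdots & & \ddots & \vdots \\
\rho_k & \rho_k & \cdots & 1
\end{pmatrix}
\right)
\tag{23}
$$

for each $k = 1, \dots, K$ and with independence between groups with different
indices $k$. This entails that the joint density of $\bar X_{1\cdot}, \dots, \bar X_{I\cdot}$ equals

$$
\left(\frac{n}{2\pi\sigma^2}\right)^{I/2}
\prod_{k=1}^{K} \left[(1 + (I_k - 1)\rho_k)(1 - \rho_k)^{I_k - 1}\right]^{-1/2}
\exp\left(-\frac{n}{2\sigma^2} \sum_{k=1}^{K} \left[
\frac{1}{1 - \rho_k} \sum_{i=L_{k-1}+1}^{L_k} (x_i - \nu_k)^2
- \frac{\rho_k}{(1 + (I_k - 1)\rho_k)(1 - \rho_k)}
\left(\sum_{i=L_{k-1}+1}^{L_k} (x_i - \nu_k)\right)^2
\right]\right).
\tag{24}
$$

This density depends on the parameters $\rho_1, \dots, \rho_K, \sigma^2, \nu_1, \dots, \nu_K$. Again,
we write $\hat\sigma_n^2$ for the estimate of $\sigma^2$.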
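With the notation

$$
\chi_{n,k}(\rho) = \left[(1 + (I_k - 1)\rho)(1 - \rho)^{I_k - 1}\right]^{-1/2}
\exp\left(-\frac{n\rho}{2\hat\sigma_n^2 (1 - \rho)} \sum_{i=L_{k-1}+1}^{L_k} (\bar X_{i\cdot} - \bar X_k)^2\right),
$$

$$S_{n,k} = \frac{n}{I_k \hat\sigma_n^2} \sum_{i=L_{k-1}+1}^{L_k} (\bar X_{i\cdot} - \bar X_k)^2,$$

$$
\rho_{n,k} = \frac{1}{2}(1 - S_{n,k})\left[1 + \sqrt{1 - \frac{4 S_{n,k}}{(I_k - 1)(1 - S_{n,k})^2}}\right]
\mathbf{1}_{[S_{n,k} < (\sqrt{I_k} - 1)/(\sqrt{I_k} + 1)]},
\tag{25}
$$

where $\rho_{n,k}$ is read as 0 whenever the indicator vanishes, Proposition A.1 of
the Appendix shows

$$V = \prod_{k=1}^{K} \max\{\chi_{n,k}(\rho_{n,k}), 1\}. \tag{26}$$

Computing this evidential value for Table 1 in Stapel, Koomen and Van
der Pligt (1996), i.e., for the first table of Section 4, we obtain

$$V = 14.49.$$

The adapted table, namely the second table of Section 4, yields

$$V = 1.28.$$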
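
For Table 1 the product in (26) can be retraced group by group. The sketch below uses hypothetical function names, again takes 338/12 observations per cell, and applies (25) to each of the three groups of four cells with $\hat\sigma_n^2 = 1.134$.

```python
import math

def group_factor(cell_means, obs_per_cell, sigma2):
    """max{chi_{n,k}(rho_{n,k}), 1} for one group of cells, following (25)."""
    size = len(cell_means)
    mean = sum(cell_means) / size
    sum_sq = sum((x - mean) ** 2 for x in cell_means)
    s_nk = obs_per_cell * sum_sq / (size * sigma2)
    root = math.sqrt(size)
    if s_nk >= (root - 1.0) / (root + 1.0):
        return 1.0  # rho_{n,k} = 0 and chi_{n,k}(0) = 1
    disc = 1.0 - 4.0 * s_nk / ((size - 1) * (1.0 - s_nk) ** 2)
    rho = 0.5 * (1.0 - s_nk) * (1.0 + math.sqrt(max(disc, 0.0)))
    log_chi = (-0.5 * math.log(1.0 + (size - 1) * rho)
               - 0.5 * (size - 1) * math.log(1.0 - rho)
               - obs_per_cell * rho * sum_sq / (2.0 * sigma2 * (1.0 - rho)))
    return max(math.exp(log_chi), 1.0)

OBS_PER_CELL = 338 / 12  # assumption: observations spread evenly over the 12 cells

# Groups of Table 1 as in Section 5 (first two cells of row Impersonal / Memory interchanged).
table_1_groups = [[3.5, 3.4, 3.3, 3.5], [2.3, 2.5, 2.3, 2.5], [2.9, 2.9, 2.9, 3.0]]

factors = [group_factor(g, OBS_PER_CELL, sigma2=1.134) for g in table_1_groups]
v_alternative = math.prod(factors)
print([round(f, 3) for f in factors], round(v_alternative, 2))
```

Most of the evidential value stems from the third group, whose four cell means are nearly identical.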



Here and in (13) we have defined the evidential value in the presence of
the nuisance parameters $\nu_1, \dots, \nu_K$ by replacing these parameters by their
maximum likelihood estimators. An alternative approach is to compute
the evidential value keeping these parameters fixed, and to subsequently
minimize the resulting evidential value over these nuisance parameters; in
formula

$$
\tilde V = \inf_{\nu_1, \dots, \nu_K \in \mathbb{R}}
\frac{\sup_{0 \le \rho_k < 1,\ k = 1, \dots, K} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, \rho_1, \dots, \rho_K)}
{f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, 0, \dots, 0)},
\tag{27}
$$

where $f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu_1, \dots, \nu_K, \rho_1, \dots, \rho_K)$ is the density as given in (24)
with $\sigma$ replaced by $\hat\sigma_n$. In fact, both definitions of evidential value yield the
same value in the situation of this Section, as is shown in Theorem B.1.



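A Appendix: Analysis of the $\chi$-Function

Here we present a proof of the main result of Section 3.

Proposition A.1. In the notation of (16) through (18) and for $I \ge 2$

$$
\sup_{0 \le \rho < 1} \chi_n(\rho)
= \mathbf{1}_{[S_n \ge (\sqrt{I} - 1)/(\sqrt{I} + 1)]}
+ \max\{\chi_n(\rho_n), 1\}\, \mathbf{1}_{[S_n < (\sqrt{I} - 1)/(\sqrt{I} + 1)]}
\tag{28}
$$

holds.

Proof

Write $\psi_n(\rho) = \log(\chi_n(\rho))$, $0 \le \rho < 1$, and $\psi_n'(\rho)$ for its derivative. Some
computation shows that

$$\psi_n(0) = 0, \quad \psi_n'(0) = -\tfrac{1}{2} I S_n \tag{29}$$

hold and that, for $\rho$ in the interval $[0, 1)$, $\psi_n'(\rho)$ is nonnegative if and only if both

$$S_n \le (\sqrt{I} - 1)/(\sqrt{I} + 1)$$

and

$$
\frac{1}{2}(1 - S_n)\left[1 - \sqrt{1 - \frac{4 S_n}{(I - 1)(1 - S_n)^2}}\right]
\le \rho \le
\frac{1}{2}(1 - S_n)\left[1 + \sqrt{1 - \frac{4 S_n}{(I - 1)(1 - S_n)^2}}\right] = \rho_n
\tag{30}
$$

hold. Consequently, $\psi_n(\rho)$ and $\chi_n(\rho)$ have (local) maxima at $\rho = 0$ and
$\rho = \rho_n$ on $[0, 1)$. This implies (28). □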
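
The computation behind this proof may be spelled out as follows. This is a sketch of one way to carry it out, not necessarily the route taken originally; it uses that by (17) the exponent in (16) equals $-\tfrac{1}{2} I S_n\, \rho/(1 - \rho)$.

```latex
\begin{align*}
\psi_n(\rho) &= -\tfrac12 \log\bigl(1+(I-1)\rho\bigr) - \tfrac{I-1}{2}\log(1-\rho)
               - \frac{I S_n}{2}\,\frac{\rho}{1-\rho}, \\
\psi_n'(\rho) &= \frac{I-1}{2}\left[\frac{1}{1-\rho} - \frac{1}{1+(I-1)\rho}\right]
               - \frac{I S_n}{2(1-\rho)^2}
   = \frac{I\,\bigl[(I-1)\rho(1-\rho) - S_n\bigl(1+(I-1)\rho\bigr)\bigr]}
          {2(1-\rho)^2\bigl(1+(I-1)\rho\bigr)}, \\
\psi_n'(\rho) \ge 0 &\iff \rho^2 - (1-S_n)\rho + \frac{S_n}{I-1} \le 0 .
\end{align*}
```

The quadratic in the last line has real roots if and only if $(1 - S_n)^2 \ge 4 S_n/(I - 1)$, which for $0 \le S_n < 1$ is equivalent to $S_n \le (\sqrt{I} - 1)/(\sqrt{I} + 1)$, and its roots are the two bounds in (30).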
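B Appendix: Alternative Definition of Evidential
Value

The alternative definition (27) of evidential value yields the same value as
(13) for the alternative dependence model as given in Section 7.

Theorem B.1. In the situation of Section 7 the evidential values as defined
by (13) and (27) satisfy

$$\tilde V = V. \tag{31}$$

Proof

Writing $\nu$ for $(\nu_1, \dots, \nu_K)$ and $\rho$ for $(\rho_1, \dots, \rho_K)$, with $0 \le \rho < 1$ understood
componentwise, we first note

$$
\begin{aligned}
\tilde V
&= \inf_{\nu} \frac{\sup_{0 \le \rho < 1} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, \rho)}{f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, 0)} \\
&\le \inf_{\nu} \frac{\sup_{0 \le \rho < 1,\ \nu^*} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu^*, \rho)}{f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, 0)} \\
&= \frac{\sup_{0 \le \rho < 1,\ \nu} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, \rho)}{\sup_{\nu} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, 0)}
= V.
\end{aligned}
\tag{32}
$$

Subsequently, we note that by the product structure of (24) it suffices to
consider the case $K = 1$ in proving $\tilde V \ge V$. Furthermore, by (24) with
$K = 1$ we have

$$
\frac{\sup_{0 \le \rho < 1} f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, \rho)}{f_n(\bar X_{1\cdot}, \dots, \bar X_{I\cdot}; \nu, 0)}
= \sup_{0 \le \rho < 1} \left[(1 + (I - 1)\rho)(1 - \rho)^{I-1}\right]^{-1/2}
\exp\left(-\frac{n\rho}{2\hat\sigma_n^2 (1 - \rho)} \left[
\sum_{i=1}^{I} (\bar X_{i\cdot} - \nu)^2
- \frac{1}{1 + (I - 1)\rho} \left(\sum_{i=1}^{I} (\bar X_{i\cdot} - \nu)\right)^2
\right]\right).
\tag{33}
$$

With the notation $\bar X = I^{-1} \sum_{i=1}^{I} \bar X_{i\cdot}$ we obtain

$$
\sum_{i=1}^{I} (\bar X_{i\cdot} - \nu)^2 - \frac{1}{1 + (I - 1)\rho} \left(\sum_{i=1}^{I} (\bar X_{i\cdot} - \nu)\right)^2
= \sum_{i=1}^{I} (\bar X_{i\cdot} - \bar X)^2 - \frac{I (I - 1)(1 - \rho)}{1 + (I - 1)\rho} (\bar X - \nu)^2
\le \sum_{i=1}^{I} (\bar X_{i\cdot} - \bar X)^2
\tag{34}
$$

in view of $\rho < 1$. Together with (33) this inequality yields

$$
\tilde V \ge \inf_{\nu} \sup_{0 \le \rho < 1} \left[(1 + (I - 1)\rho)(1 - \rho)^{I-1}\right]^{-1/2}
\exp\left(-\frac{n\rho}{2\hat\sigma_n^2 (1 - \rho)} \sum_{i=1}^{I} (\bar X_{i\cdot} - \bar X)^2\right).
\tag{35}
$$

Since the infimum over $\nu$ may be removed from (35), equations (15) and
(16) with $K = 1$ imply $\tilde V \ge V$, which completes the proof. □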



C Appendix: t-Statistic under Dependence

If one would be interested in the distribution of the exponent in (16) or of
$S_n$ from (17), the following lemma would come in handy.

Lemma C.1. Let the correlated standard normal random variables $Z_1, \dots, Z_d$
have a joint multivariate normal distribution, namely

$$
Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_d \end{pmatrix}
\sim N\left( 0,\
\begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix}
\right)
\tag{36}
$$

with $0 \le \rho < 1$. Let

$$\bar Z_d = \frac{1}{d} \sum_{i=1}^{d} Z_i, \quad S_d^2 = \frac{1}{d - 1} \sum_{i=1}^{d} (Z_i - \bar Z_d)^2 \tag{37}$$

be their sample mean and sample variance, respectively.

Then $\bar Z_d$ and $S_d^2$ are independent, $\bar Z_d$ has a normal distribution with mean 0
and variance $(1 + (d - 1)\rho)/d$, and $(d - 1) S_d^2/(1 - \rho)$ has a chi squared
distribution with $d - 1$ degrees of freedom.

Proof 

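The following classical trick for the case $\rho = 0$ also works for positive $\rho$. Let
$A$ be an orthogonal (orthonormal) matrix, the first row of which is the row
vector $(d^{-1/2}, \dots, d^{-1/2})$. Define the column $d$-vector $Y$ by $Y = AZ$, and
note

$$Y_1 = \sqrt{d}\, \bar Z_d, \quad \sum_{i=2}^{d} Y_i^2 = \sum_{i=1}^{d} Z_i^2 - d \bar Z_d^2 = (d - 1) S_d^2 \tag{38}$$

and

$$
A \begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix} A^T
= (1 - \rho) A A^T + \rho\, A \mathbf{1} \mathbf{1}^T A^T
= \begin{pmatrix}
1 + (d - 1)\rho & 0 & \cdots & 0 \\
0 & 1 - \rho & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 - \rho
\end{pmatrix},
\tag{39}
$$

where $\mathbf{1}$ denotes the column vector $(1, \dots, 1)^T$ and where the matrix
equalities hold because $A A^T$ equals the identity matrix
and because all other row vectors of $A$ are orthogonal to its first row vector
$(d^{-1/2}, \dots, d^{-1/2})$, and hence to all multiples of $(1, \dots, 1)$. Since (39) is the
covariance matrix of the multivariate normally distributed vector $Y$, it
follows that $Y_1, \dots, Y_d$ are independent, and consequently, that $\bar Z_d$ and $S_d^2$ are.
Finally, (38) and (39) imply that $Y_2, \dots, Y_d$ are independent identically
distributed with a normal distribution with mean 0 and variance $1 - \rho$, which
yields that $(1 - \rho)^{-1} \sum_{i=2}^{d} Y_i^2$ has a chi squared distribution with $d - 1$
degrees of freedom. □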



We note that as a consequence the statistic

$$\sqrt{\frac{d(1 - \rho)}{1 + (d - 1)\rho}}\ \frac{\bar Z_d}{S_d} \tag{40}$$

has a t-distribution with $d - 1$ degrees of freedom.
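
Lemma C.1 lends itself to a quick simulation check. The sketch below is an illustration only: the function names, the seed and the values $d = 6$, $\rho = 0.4$ are our own choices. It builds equicorrelated standard normals from one shared component, which is one standard construction of the distribution (36), and compares the simulated variance of $\bar Z_d$ and the simulated mean of $(d - 1) S_d^2/(1 - \rho)$ with the values $(1 + (d - 1)\rho)/d$ and $d - 1$ stated in the lemma.

```python
import math
import random

def equicorrelated_normals(d, rho, rng):
    """Standard normals with pairwise correlation rho, built from a shared component."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(d)]

def simulate(d, rho, repetitions, seed):
    """Return the simulated variance of the sample mean and the mean of (d-1) S^2 / (1-rho)."""
    rng = random.Random(seed)
    means, scaled_vars = [], []
    for _ in range(repetitions):
        z = equicorrelated_normals(d, rho, rng)
        z_bar = sum(z) / d
        s2 = sum((x - z_bar) ** 2 for x in z) / (d - 1)
        means.append(z_bar)
        scaled_vars.append((d - 1) * s2 / (1.0 - rho))
    var_of_mean = sum(m * m for m in means) / repetitions  # the sample mean has expectation 0
    mean_scaled = sum(scaled_vars) / repetitions
    return var_of_mean, mean_scaled

d, rho = 6, 0.4
var_of_mean, mean_scaled = simulate(d, rho, repetitions=40_000, seed=2024)
print(f"Var(Z-bar): simulated {var_of_mean:.3f}, lemma {(1 + (d - 1) * rho) / d:.3f}")
print(f"E[(d-1) S^2 / (1-rho)]: simulated {mean_scaled:.3f}, lemma {d - 1}")
```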